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Abstract — The canonical problem of solving a system of linear 
equations arises in numerous contexts in information tlieory, com- 
munication theory, and related fields. In this contribution, we de- 
velop a solution based upon Gaussian belief propagation (GaBP) 
that does not involve direct matrix inversion. The iterative 
nature of our approach allows for a distributed message-passing 
implementation of the solution algorithm. We also address some 
properties of the GaBP solver, including convergence, exactness, 
its max-product version and relation to classical solution methods. 
The appUcation example of decorrelation in CDMA is used to 
demonstrate the faster convergence rate of the proposed solver 
in comparison to conventional Unear-algebraic iterative solution 
methods. 

I. Problem Formulation and Introduction 

Solving a system of linear equations Ax = b is one of 
the most fundamental problems in algebra, with countless 
applications in the mathematical sciences and engineering. 
Given the observation vector b G G N*, and the data 

matrix A G M"^", a unique solution, x = x* G M", exists if 
and only if the data matrix A is full rank. In this contribution 
we concentrate on the popular case where the data matrices, A, 
are also symmetric {e.g., as in correlation matrices). Thus, 
assuming a nonsingular symmetric matrix A, the system of 
equations can be solved either directly or in an iterative 
manner. Direct matrix inversion methods, such as Gaussian 
eUmination (LU factorization, [1]-Ch. 3) or band Cholesky 
factorization ( [1]-Ch. 4), find the solution with a finite number 
of operations, typically, for a dense n x n matrix, on the order 
of n^. The former is particularly effective for systems with 
unstructured dense data matrices, while the latter is typically 
used for structured dense systems. 

Iterative methods [2] are inherently simpler, requiring only 
additions and multiplications, and have the further advantage 
that they can exploit the sparsity of the matrix A to reduce the 
computational complexity as well as the algorithmic storage 
requirements [3]. By comparison, for large, sparse and amor- 
phous data matrices, the direct methods are impractical due to 
the need for excessive row reordering operations. The main 
drawback of the iterative approaches is that, under certain 
conditions, they converge only asymptotically to the exact 
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solution X* [2J. Thus, there is the risk that they may converge 
slowly, or not at all. In practice, however, it has been found 
that they often converge to the exact solution or a good 
approximation after a relatively small number of iterations. 

A powerful and efficient iterative algorithm, belief propa- 
gation (BP) [4], also known as the sum-product algorithm, 
has been very successfully used to solve, either exactly or 
approximately, inference problems in probabilistic graphical 
models [5]. In this paper, we reformulate the general problem 
of solving a linear system of algebraic equations as a prob- 
abilistic inference problem on a suitably-defined graph. We 
beheve that this is the first time that an exphcit connection 
between these two ubiquitous problems has been estabUshed. 
As an important consequence, we demonstrate that Gaussian 
BP (GaBP) provides an efficient, distributed approach to solv- 
ing a linear system that circumvents the potentially complex 
operation of direct matrix inversion. 

We shall use the following notations. The operator { j^ 
denotes a vector or matrix transpose, the matrix I„ is a n x n 
identity matrix, while the symbols {•}, and denote entries 
of a vector and matrix, respectively. 

11. THE GaBP Solver 
A. From Linear Algebra to Probabilistic Inference 

We begin our derivation by defining an undirected graphical 
model (i.e. , a Markov random field), Q, corresponding to the 
linear system of equations. Specifically, let 5 = {X,£), where 
A" is a set of nodes that are in one-to-one correspondence with 
the linear system's variables x = {.ri, .... .J ,,}"^, and where 
f is a set of undirected edges determined by the non-zero 
entries of the (symmetric) matrix A. Using this graph, we can 
translate the problem of solving the linear system from the 
algebraic domain to the domain of probabilistic inference, as 
stated in the following theorem. 

Proposition 1 (Solution and inference): The computation 
of the solution vector x* is identical to the inference of the 
vector of marginal means p = {pi, . . . , Hn} over the graph 
Q with the associated joint Gaussian probabiUty density 
function p(x) ~ A/'(/u = A'^b, A"!). 

Proof: Another way of solving the set of linear equations 
Ax — b = is to represent it by using a quadratic form 
q'(x) = x^Ax/2 — b^x. As the matrix A is symmetric, the 



derivative of the quadratic form w.r.t. the vector x is given 
by the vector dq/dx ~ Ax — b. Thus equating dq/dx ~ 
gives the stationary point x*, which is nothing but the desired 
solution to Ax = b. Next, one can define the following joint 
Gaussian probability density function 

p{x) = Z-^ exp ( - g(x)) = Z'^ cxp (-x^Ax/2 + b^x), 

(1) 

where Z is a distribution normalization factor Denoting the 
vector ^ = A^^b, the Gaussian density function can be 
rewritten as 

p(x) = Z"^exp(^'^A/i/2) 

X exp (— x"^Ax/2 + /i"^Ax — /x"^A^/2) 

^ riexp(-(x-/x)^A(x-/.)/2) 

= AA(Ai,A-i), (2) 

where the new normalization factor C — Z cxp (— pt^A/i/2). 
It follows that the target solution x* = A^^b is equal to /i = 
A^^b, the mean vector of the distribution p(x), as defined 
above ([T]i. Hence, in order to solve the system of linear equa- 
tions we need to infer the marginal densities, which must also 
be Gaussian, ^(xi) - 7V(Ai, = {A-^b},, ^ = {A-i},i), 
where [li and Pi are the marginal mean and inverse variance 
(sometimes called the precision), respectively. ■ 
According to Proposition [T] solving a deterministic vector- 
matrix linear equation translates to solving an inference prob- 
lem in the corresponding graph. The move to the probabilistic 
domain calls for the utilization of BP as an efficient inference 
engine. 

B. Belief Propagation in Graphical Model 

Belief propagation (BP) is equivalent to applying Pearl's lo- 
cal message-passing algorithm [4], originally derived for exact 
inference in trees, to a general graph even if it contains cycles 
(loops). BP has been found to have outstanding empirical 
success in many applications, e.g. , in decoding Turbo codes 
and low-density parity-check (LDPC) codes. The excellent 
performance of BP in these applications may be attributed 
to the sparsity of the graphs, which ensures that cycles in the 
graph are long, and inference may be performed as if the graph 
were a tree. 

Given the data matrix A and the observation vector b, one 
can write explicitly the Gaussian density function, p(x) (j2]l, 
and its corresponding graph Q consisting of edge potentials 
('compatibility functions') '^ij and self potentials ('evidence') 
These graph potentials are simply determined according 
to the following pairwise factorization of the Gaussian func- 
tion (fT) 

n 

), (3) 

resulting in -(/'^^(xi, Xj) = cxp(— XiAijXj) and 
i>i{xi) = &xYi{biXi — Aiix'lj'i). Note that by 
completing the square, one can observe that 
(i>i{xi) oc7V(/i,, = &j/A,,,i^7^ = Ar^). The graph topology 



is specified by the structure of the matrix A, i.e. , the edges 
set {i, j} includes all non-zero entries of A for which i > j. 

The BP algorithm functions by passing real- valued mes- 
sages across edges in the graph and consists of two compu- 
tational rules, namely the 'sum-product rule' and the 'product 
rule'. In contrast to typical applications of BP in coding 
theory [6], our graphical representation resembles a pairwise 
Markov random field [5] with a single type of propagating 
message, rather than a factor graph [7] with two different 
types of messages, originating from either the variable node 
or the factor node. Furthermore, in most graphical model 
representations used in the information theory literature the 
graph nodes are assigned discrete values, while in this con- 
tribution we deal with nodes corresponding to continuous 
variables. Thus, for a graph Q composed of potentials tpij 
and (f)i as previously defined, the conventional sum-product 
rule becomes an integral -product rule [8] and the message 
mij{xj), sent from node i to node j over their shared edge 
on the graph, is given by 

mij{xj) (X / ipij{xi,Xj)(j)i{xi) Y[ rnki{xi)dxi. (4) 

The marginals are computed (as usual) according to the 
product rule 

p{xi) = a(j)i{xi) Y[ nikiixi), (5) 

feGN(i) 

where the scalar a is a normalization constant. The set of 
graph nodes N(i) denotes the set of all the nodes neighboring 
the ith node. The set N(i)\j excludes the node j from N(i). 

C. The Gaussian BP Algorithm 

Gaussian BP is a special case of continuous BP, where the 
underlying distribution is Gaussian. Now, we derive the Gaus- 
sian BP update rules by substituting Gaussian distributions into 
the continuous BP update equations (|4])-(|5]l. Before describing 
the inference algorithm performed over the graphical model, 
we make the elementary but very useful observation that the 
product of Gaussian densities over a common variable is, up 
to a constant factor, also a Gaussian density. 

Lemma 2 (Product of Gaussians): Let fi{x) and f2{x) 
be the probability density functions of a Gaussian 
random variable with two possible densities M{^i,Pi^) 
and A/^(/i2, ^2"^), respectively. Then their product, 
f{x) — fi{x)f2{x) is, up to a constant factor, the probability 
density function of a Gaussian random variable with 
distribution J\f{fi, P^^), where 

p-' = {Pi+P2r\ (6) 

/i = p-\Pl^il + P2^l2)■ (7) 

Proof The proof of this lemma is straightforward, thus 
omitted. ■ 
Fig. 1. plots a portion of a certain graph, describing the 
neighborhood of node i. Each node (empty circle) is associated 
with a variable and self potential 0, which is a function of 
this variable, while edges are identified with the pairwise 
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Fig. 1. Graphical model: The neighborhood of node i. 



(symmetric) potentials ij}. Messages propagate along the edges 
in both directions. The messages relevant for the computation 
of message rriij are shown in Fig. 1.). Looking at the right 
hand side of the integral-product rule node i needs to first 
calculate the product of all incoming messages, except for the 
message coming from node j. Recall that since p(x) is jointly 
Gaussian, the factorized self potentials 4>i{xi) cx Afdia, P^^^) 
and similarly all messages mki{xi) oc J^ilJ-ki, P^i^) are of 
Gaussian form as well. 

As the terms in the product of the incoming messages and 
the self potential in the integral -product rule Q are all a 
function of the same variable, Xi (associated with the node 
i), then, according to the multivariate extension of Lemma |2] 
(f>i{xi) nfeeN(i)\j ^ki{xi) is proportional to a certain Gaussian 
distribution, Af{iii\j, P^j). Applying the multivariate version 
of the product precision expression in (j6]l, the update rule for 
the inverse variance is given by (over-braces denote the origin 
of each of the terms) 



Pa 



E 

fceN(i)\j 



rrikiixi) 



(8) 



where Pa = An is the inverse variance a-priori associated 
with node i, via the precision of (j)i{xi), and Pki are the inverse 
variances of the messages mki{xi). Similarly, using (j7| for the 
multivariate case, we can calculate the mean 




(9) 



where = bi/Au is the mean of the self potential and 
are the means of the incoming messages. 

Next, we calculate the remaining terms of the mes- 
sage mij{xj), including the integration over Xi. After 
some algebraic manipulation, using the Gaussian integral 
exp (— ax^ + bx)dx — y/njaexp (6^/4a), we find that 
the messages mij{xj) are proportional to a normal distribution 
with precision and mean 



P- 



-PZ^A 



(10) 

(11) 



These two scalars represent the messages propagated in the 
GaBP-based algorithm. 

Finally, computing the product rule (|5]l is similar to the 
calculation of the previous product and the resulting mean (|9]) 



and precision (j8]l, but including all incoming messages. The 
marginals are inferred by normalizing the result of this prod- 
uct. Thus, the marginals are found to be Gaussian probability 
density functions Pj^^) with precision and mean 



Pi Pa ^" ^ ^ Pki : 

keNii) 
<t>i{xi) mki{xi) 



(12) 



^J^^i 



feeN(i) 



respectively. 

For a dense data matrix, the number of messages passed on 
the graph can be reduced from 0{n^) {i.e. , twice the number 
of edges) down to 0{n) messages per iteration round by using 
a similar construction to Bickson etaJ. [9]; Instead of sending 
a unique message composed of the pair of Hij and Pij from 
node i to node j, a node broadcasts aggregated sums to all 
its neighbors, and consequently each node can retrieve locally 
Pi\j ([8jl and (|9]) from the aggregated sums 

fcGN(i) 

= P~\Puf^it+ Pk^f^k^) (15) 

A:eN(i) 



by means of a subtraction 

Pi\j = Pi ^ Pjii 

fJ'i\j ~ i^i ^ P^\^j PjifJ-ji 



(16) 
(17) 



The following pseudo-code summarizes the GaBP solver al- 
gorithm. 

Algorithm 1 (GaBP solver): 



1 . 


Initialize : 


/ 


Set the neighborhood N(z) to include 








/ i such that A,,, / 0. 






/ 


Fix the scalars 








Pu = A,i and fii, = h/A,,, Vi. 






/ 


Set the initial i ^ N(i) broadcast messages 








Pi = and fii = 0. 






/ 


Set the initial fc^ i.fceN(i) internal scalars 








Pi., = and = 0. 






/ 


Set a convergence threshold e. 


2 . 


Iterate : 


/ 


Broadcast the aggregated sum messages 








a = p„+i:kn(.)-P'=.. 








/I, = p. \p„fi„ + PfaMi,), Vi 








(under chosen scheduling) . 






/ 


Compute the i ^ j^i € N(j) internal scalars 








P.,=-Al/(P,-P,.], 








= (P.,.I. - P„(i,.)M,:.,. 


3 . 


Check: 


/ 


If the internal scalars P^j and fi,^j did not 








converge (w.r.t. e) , return to Step 2. 






/ 


Else, continue to Step 4. 


4 . 


Infer : 


/ 


Compute the marginal means 








= (P../J.,. + EteNC) Pi.mi.i)/{P., + T.km-} -P") = i^" 






(/ 


Optionally compute the marginal precisions 










5 . 


Solve: 


/ 


Find the solution 








X* — ^i, Vi. 



D. Max-Product Rule 

A well-known alternative to the sum-product BP algorithm 
is the max-product (a.k.a. min-sum) algorithm [5]. In this 



variant of BP, a maximization operation is peri'ormed rather 
than marginahzation, i.e. , variables are eliminated by taking 
maxima instead of sums. For trellis trees {e.g., graphical 
representation of convolutional codes or ISI channels), the 
conventional sum-product BP algorithm boils down to per- 
forming the BCJR algorithm, resulting in the most probable 
symbol, while its max-product counterpart is equivalent to the 
Viterbi algorithm, thus inferring the most probable sequence 
of symbols [7]. 

In order to derive the max-product version of the proposed 
GaBP solver, the integral(sum)-product rule Q is replaced by 
a new rule 

m.ij{xj) (X aigma.xil:ij{xi,Xj)(j)i{xi) Y[ mkiixi). (IS) 

feGN(j)\i 

Computing m.ij{xj) according to this max-product rule, one 
gets (the exact derivation is omitted) 

(19) 

which is identical to the messages derived for the sum-product 
case (10i-(lli. Thus interestingly, as opposed to ordinary 



(discrete) BP, the following property of the GaBP solver 
emerges. 



Corollary 3 (Max-product): The max-product ( 18 i and 
sum-product Q versions of the GaBP solver are identical. 

III. Convergence and Exactness 

In ordinary BP, convergence does not guarantee exactness 
of the inferred probabilities, unless the graph has no cycles. 
Luckily, this is not the case for the GaBP solver. Its un- 
derlying Gaussian nature yields a direct connection between 
convergence and exact inference. Moreover, in contrast to BP, 
the convergence of GaBP is not limited to acyclic or sparse 
graphs and can occur even for dense (fully-connected) graphs, 
adhering to certain rules that we now discuss. We can use 
results from the literature on probabilistic inference in graph- 
ical models [8], [10], [11] to determine the convergence and 
exactness properties of the GaBP solver The following two 
theorems establish sufficient conditions under which GaBP is 
guaranteed to converge to the exact marginal means. 

Theorem 4: [8, Claim 4] If the matrix A is strictly di- 
agonally dominant (i.e., \An\ > X^j^j l^ul;^*)' th^n GaBP 
converges and the marginal means converge to the true means. 

This sufficient condition was recently relaxed to include a 
wider group of matrices. 

Theorem 5: [10, Proposition 2] If the spectral radius (i.e., 
the maximum of the absolute values of the eigenvalues) p 
of the matrix |I„ — A| sarisfies p{\In — A|) < 1, then GaBP 
converges and the marginal means converge to the true means. 

There are many examples of linear systems that violate these 
conditions for which the GaBP solver nevertheless converges 
to the exact solution. In particular, if the graph corresponding 
to the system is acyclic (i.e. , a tree), GaBP yields the exact 
marginal means (and even marginal variances), regardless of 
the value of the spectral radius [8]. 



IV. Relation to Classical Solution Methods 

It can be shown (see also Plarre and Kumar [12]) that the 
GaBP solver (Algorithm [TJ for a system of linear equations 
represented by a tree graph is identical to the renowned direct 
method of Gaussian elimination (a.k.a. LU factorization, [1]). 
The interesting relation to classical iterative solution meth- 
ods [2] is revealed via the following proposition. 

Proposition 6 (Jacobi and GaBP solvers): 
The GaBP solver (Algorithm [T]) 

1) with inverse variance messages arbitrarily set to zero, 
i.e., P,j^O,teN{j),yj; 

2) incorporating the message received from node j when 
computing the message to be sent from node i to node 
j, i.e., replacing k £ N(z)\j with k £ N(i); 

is identical to the Jacobi iterative method. 

Proof: Arbitrarily setting the precisions to zero, we get 
in correspondence to the above derivation. 



— P — A- 

J 7,7, 



(20) 

Pijl^ij = ^Ajl^i\j7 (21) 

Mi = - ^ AktPk\i)- (22) 

/ceN(i) 



Note that the inverse relation between Ptj and Pi\j ( 10 1 is no 
longer valid in this case. Now, we rewrite the mean p^^j (j9]l 
without excluding the information from node j. 



M*\j = 



^feiMfc\i)- 

fcGN(i) 



(23) 



Note that pi\j = pi, hence the inferred marginal mean pi (22 1 
can be rewritten as 



AkiPk) 



(24) 



where the expression for all neighbors of node i is replaced 
by the redundant, yet identical, expression k ^ i. This fixed- 
point iteration is identical to the element-wise expression 
of the Jacobi method [2], concluding the proof. ■ 
Now, the Gauss-Seidel (GS) method can be viewed as a 
'serial scheduling' version of the Jacobi method; thus, based 
on Proposition [6j it can be derived also as an instance of the 
serial (message-passing) GaBP solver. Next, since successive 
over-relaxation (SOR) is nothing but a GS method averaged 
over two consecutive iterations, SOR can be obtained as a 
serial GaBP solver with 'damping' operation [13]. 

V. Application Example: Linear Detection 

We examine the implementation of a decorrelator linear 
detector in a CDMA system with spreading codes based 
upon Gold sequences of length N = 7. Two system setups 
are simulated, corresponding to n = 3 and n = 4 users. 
The decorrelator detector, a member of the family of linear 
detectors, solves a system of linear equations. Ax = b, where 
the matrix A is equal to the n x n correlation matrix R, and 
the observation vector b is identical to the n-length CDMA 



Algorithm 


Iterations t (Rn=3) 


Iterations t (Ti.n=i) 


Jacobi 


111 


24 


GS 


26 


26 


Parallel GaBP 


23 


24 


Optimal SOR 


17 


14 


Serial GaBP 


16 


13 


Jacobi+Steffensen 


59 




Parallel GaBP+Steffensen 


13 


13 


Serial GaBP+Steffensen 


9 


7 



TABLE I 
Convergence rate. 



channel output vector y. Thus, the vector of decorrelator deci- 
sions is determined by taking the signum (for binary signaling) 
of the vector A^^b — R^^y. Note that R„=3 and R,i=4 in 
this case are not strictly diagonally dominant, but their spectral 
radii are less than unity, since p(|l3 — R„=3|) — 0.9008 < 1 
and p{\l4 - R„=4|) = 0.8747 < 1, respectively. In all of the 
experiments, we assumed the (noisy) output sample was the 
all-ones vector 

Table |l] compares the proposed GaBP solver with stan- 
dard iterative solution methods [2], previously employed for 
CDMA multiuser detection (MUD). Specifically, MUD algo- 
rithms based on the algorithms of Jacobi, GS and (optimally 
weighted) SOR were investigated [14]-[16]. Table [I] lists 
the convergence rates for the two Gold code-based CDMA 
settings. Convergence is identified and declared when the 
differences in all the iterated values are less than 10^^. We 
see that, in comparison with the previously proposed detectors 
based upon the Jacobi and GS algorithms, the serial (asyn- 
chronous) message-passing GaBP detector converges more 
rapidly for both n = 3 and n — 4 and achieves the best 
overall convergence rate, surpassing even the optimal SOR- 
based detector. Further speed-up of the GaBP solver can be 
achieved by adopting known acceleration techniques from 
linear algebra. Table [l] demonstrates the speed-up of the GaBP 
solver obtained by using such an acceleration method, termed 
Steffensen's iterations [17], in comparison with the accelerated 
Jacobi algorithm (diverged for the 4 users setup). We remark 
that this is the first time such an acceleration method is ex- 
amined within the framework of message-passing algorithms 
and that the region of convergence of the accelerated GaBP 
solver remains unchanged. 

The convergence contours for the Jacobi and parallel (syn- 
chronous) GaBP solvers for the case of 3 users are plotted in 
the space of {xi,X2,X3} in Fig. |2] As expected, the Jacobi 
algorithm converges in zigzags directly towards the fixed point. 
It is interesting to note that the GaBP solver's convergence is 
in a spiral shape, hinting that despite the overall convergence 
improvement, performance improvement is not guaranteed 
in successive iteration rounds. Further results and elaborate 
discussion on the application of GaBP specifically to linear 
MUD may be found in recent contributions [18], [19]. 
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Fig. 2. Convergence visualization. 
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